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Abstract 

Background: 5-metliylcytosine (mC) can be oxidized by tine tet metliylcytosine dioxygenase (Tet) family of 
enzymes to S^liydroxymetliylcytosine (limC), wliicln is an intermediate of mC demetliylation and may also be a 
stable epigenetic modification that influences chromatin structure. hmC is particularly abundant in mammalian 
brains but its function is currently unknown. A high-resolution hydroxymethylome map is required to fully 
understand the function of hmC in the human brain. 

Results: We present genome-wide and single-base resolution maps of hmC and mC in the human brain by combined 
application of Tet-assisted bisulfite sequencing and bisulfite sequencing. We demonstrate that hmCs increase markedly 
from the fetal to the adult stage, and in the adult brain, 13% of all CpGs are highly hydroxy methylated with strong 
enrichment at genie regions and distal regulatory elements. Notably, hmC peaks are identified at the 5'splicing sites at 
the exon-intron boundary, suggesting a mechanistic link between hmC and splicing. We report a surprising transcription- 
correlated hmC bias toward the sense strand and an mC bias toward the antisense strand of gene bodies. Furthermore, 
hmC is negatively correlated with H3K27me3-marked and H3K9me3-marked repressive genomic regions, and is more 
enriched at poised enhancers than active enhancers. 

Conclusions: We provide single-base resolution hmC and mC maps in the human brain and our data imply novel roles 
of hmC in regulating splicing and gene expression. Hydroxymethylation is the main modification status for a large portion 
of CpGs situated at poised enhancers and actively transcribed regions, suggesting its roles in epigenetic tuning at 
these regions. 



Background 

Methylation of cytosine (mC) plays a role in many crucial 
cellular processes. In 2009, it was shown that mC can be 
oxidized to 5-hydroxymethylcytosine (hmC) by tet methyl- 
cytosine dioxygenase (Tet) family of enzyme, and that em- 
bryonic stem cells (ESCs) and mouse brain tissues contain 
high levels of hmC [1,2]. These and subsequent findings 
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evidently suggested that hmC is an intermediate in the 
long pursued pathway of active DNA demethylation [3,4]. 

Soon after the discovery, a series of genome-wide map- 
ping studies of hmC were performed using affinity or 
enzyme-based approaches by us and others [5-16]. These 
studies, at low resolution and semi-quantitative, have pro- 
vided significant insights into the distribution and func- 
tions of hmC at distal regulatory elements, gene bodies, 
and polycomb repression complex-bound promoters. 
More recently, two bisulfite-sequencing (BS-Seq) derived 
methods, Tet-assisted bisulfite sequencing (TAB-Seq) and 
oxidative bisulfite sequencing (oxBS-Seq), were established 
to quantitatively sequence hmC at base resolution [17,18]. 
The first genome-wide application of TAB-Seq to mam- 
malian ESCs revealed novel characters of hmC such as its 
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deposition around, but not within, transcription factor 
binding sites [17]. 

The content of hmC in the mammaUan brain is typic- 
ally five to ten times higher than in any other tissues, 
suggesting a potential role for hmC in the brain [2,8,19]. 
hmC could be an intermediate of mC demethylation, 
suggesting a potential high turnover rate of DNA methy- 
lation in the brain [20]. In addition, given its high abun- 
dance and stability, hmC could act as an epigenetic 
modification that influences genome structure and func- 
tion by recruiting chromatin modifiers [21,22]. Recent 
studies have shown that Tetl mutant mice exhibit mem- 
ory defects, suggesting that DNA hydroxymethylation 
plays an important role in normal brain function [23,24]. 

However, the exact function of hmC in the mammalian 
brain remains to be understood. Recently, Lister et al. re- 
ported comprehensive genome-wide DNA methylation 
maps in the human and mouse brain using BS-Seq, which 
also include hmC maps in the mouse brain using TAB- 
Seq [25]. Here, we applied TAB-Seq combined with BS- 
Seq to map the DNA hydroxymethylome and methylome 
at single-base resolution in the human brain. Our data un- 
covered new features of hmC including hmC peaks at 5 ' 
splicing sites and a transcription-corrected hmC bias to- 
ward the sense strand of gene bodies, implying novel roles 
of hmC in regulating splicing and gene expression in the 
brain. 

Results and discussion 

Base-resolution hydroxymethylome and methylome 
mapping in the human brain and identification of 
highly hydroxymethylated cytosines 

We performed TAB-Seq and BS-Seq on a DNA sample 
isolated from the prefrontal cortex of an adult individual 
and sequenced it to an average depth of 22 x per strand by 
TAB-Seq and 9.3x by BS-Seq. For TAB-Seq, we observed 
a low non-conversion rate of unmodified cytosine (0.36%) 
and mC (1.18%), and a high protection rate of hmC 
(97.6%). We also applied TAB-Seq to a DNA sample iso- 
lated from the prefrontal cortex of a fetal brain and se- 
quenced it to an average depth of llx per strand, with the 
non-conversion rates of unmodified cytosine and mC be- 
ing 0.25% and 1.51%, respectively. The sequencing details 
are summarized in Additional file 1. TAB-Seq detected ap- 
proximately 28.4 million hmCs in the adult prefrontal cor- 
tex, 10-fold more than that in the fetal prefrontal cortex 
(approximately 2.6 million), and BS-Seq detected approxi- 
mately 49.9 million modified cytosines (modCs) (false dis- 
covery rate <1%; Figure lA, and see Materials and 
methods). The much higher number of hmCs detected in 
the adult prefrontal cortex is consistent with previous re- 
ports [8,9]. To quantitatively verify this, we performed li- 
quid chromatography-tandem mass spectrometry (LC- 
MS/MS) to genomic DNAs isolated from several regions 



of these two brain samples, as well as another pair of fetal 
and adult brain samples, and the results confirmed that 
the abundance of hmC in the adult human brain is nearly 
six times higher than that in the fetal brain (%hmC/dC 
average 0.866 vs 0.154) (Additional files 2 and 3). Whereas 
a notable portion of modC exists in non-CpG contexts in 
the adult prefrontal cortex (16.1% in CHH and 2.8% in 
CHG, where H = A, C or T), hmC exists predominantly in 
CpG context (97.4% in the adult cortex and 99.86% in the 
fetal cortex), which is in line with the finding in mouse 
[25] (Figure la). We also applied BS-Seq and TAB-Seq to 
a DNA sample extracted from the hippocampus of the 
fetal brain, and found that both modC and hmC exists 
predominantly in CpG context, confirming the previous 
finding that the non-CpG modification in the brain is 
adult-specific [25] (Additional file 4). Since the majority of 
hmC exists in a CpG context, we next focused our analysis 
on CpG modifications. 

Examples of the hmC and mC maps were shown in 
Figure lb. A genomic region of 12 mb (mega base pairs) 
on chromosome 4 displays increased hmC levels in gene- 
rich regions, coincident with a decrease in mC levels; 
these changes could not be distinguished by traditional bi- 
sulfite sequencing alone because the modC levels mea- 
sured by BS-Seq change only slightly. Closer inspection of 
a 12 kb (kilo base pairs) region around the TET2 tran- 
scription start site (TSS) further supports the need for 
TAB-Seq. For many individual CpG sites, the hmC level is 
higher than the mC level. In particular, the highly modified 
CpGs immediately upstream the TSS are highly hydroxy- 
methylated and could be mistaken as highly methylated 
sites when applying traditional BS-Seq alone. These results 
highlight the need to apply TAB-Seq to distinguish hmC 
and mC unequivocally for obtaining accurate hydroxy- 
methylation and methylation maps in the brain. The hmC 
map of the fetal brain is also provided and suggested that 
patterns of hmCs in the adult brain have been partially 
established in the fetal stage. 

We calculated the modification frequency of hmC and 
mC for individual CpG sites. In general, hmCs occur at 
relatively low frequency (median 29.2% and 30.8% in the 
adult and fetal brains, respectively), whereas most mCs 
occur at high frequency (median 59.7% in the adult brain) 
(Additional file 5a and b). Then we classified CpG sites ac- 
cording to their hmC and mC frequencies. We first divided 
all CpGs into three categories depending on the total modi- 
fication (hmC + mC, or modC). In agreement with previous 
studies [26], we determined that 89.8% of the CpGs are 
highly modified (modC'^sh^ modC > 50%), 6.9% are un- 
modified (modC"° mode < 10%), and 3.3% {n = 1,423,006) 
are modified at low levels (modC'°™, 10% < modC < 50%) 
(Figure 2a). Next, we divided the modC*^'**^ into three sub- 
groups: (1) hmC'^^'^: hmC > mC; (2) hmC'°"': hmC < mC; 
and (3) hmC"°: no hmC. We determined that hmC*^^ 



Wen ef al. Genome Biology 2014, 15:R49 
http://genomebiology.com/201 4/1 5/3/R49 



Page 3 of 1 7 



(a) 



hmCG ■ hmCHH ■ hmCHG 

0.4»/ir°''° 
hmCG = 27.7 million 



I modCG ■ modCHH ■ modCHG 



hmCHH=0.1% 
hmCHG=0.04% 





(b) 



modCG 
(adult) 



hmCG 
(adult) 



mCG 
(adult) 

Genes 



Adult brain/TAB-Seq 



1 mb 



Fetal brain/TAB-Seq 



Adult brain/BS-Seq 



chr4:95,280,000-1 07,500,000 




'DUM5 BMPR1B PDHA2 




modCG 0 
(adult) 



hmCG 
(adult) 



mCG 
(adult) 



hnnCG n 
(fetal) 



III 






1 


II III 


1 

1 




II 

1 


mil 








II III 

1 III 




1 


! II " ' 








iiiiii i n 




1 t'l III 


\\ \i 


III. 


1 

1 


1 


' ml 




1 1 
' 1 


1' i'' 







Figure 1 Base-resolution hydroxymethylome and methylome in the human brain, (a) The percentages of hmCs or modCs in the adult or 
the fetal brain in the contexts of CG, CHH, and CHG. (b) Examples of the hmC, mC, and total modification (hmC + mC) profiles are shown for a 
genomic region of 12 mb on chromosome 4 as a scatterplot (Upper panel) and for a 12 kb region surrounding the TSS of the TET2 gene as a bar 
chart. The green box indicates the CpG island located in the TET2 promoter. 



(« = 5,692,354) accounts for a notable proportion (13.4%) of 
all captured CpGs. The hmC'°™ (« = 20,857,810) and 
hmC"° (« = 11,695,002) categories comprised 49% and 
27.4%, respectively. Since further analysis revealed that 
hmC'°" and hmC"° have similar characters, they were 
grouped into mC*^^*^ referring to highly-methylated cyto- 
sines (Figure 2a). 

Genomic distribution of hmC in the human brain 

Next, we determined the genomic distribution of modC'°"', 
hmC'^'^'^, and mC*^^*^. In addition to the annotated genomic 
features, we also analyzed enhancers mapped by a recently 
published ChlP-Seq dataset in the human brain [27]. Two 
types of enhancers were distinguished: active enhancers 
that were simultaneously marked by distal H3K4mel and 



H3K27ac, and poised enhancers that were solely marked 
by distal H3K4mel [28]. modC'°"' is prominently enriched 
at enhancers (Figure 2b), which is in accordance with pre- 
vious bisulfite studies [26]. We found that hmC*^^*^ is also 
highly enriched at enhancers. In contrast to more enrich- 
ment of modC'°™ at active enhancers, hmC*"^ is more 
enriched at poised enhancers (Figure 2b). Furthermore, 
hmC^^ is abundant at introns and exons. Most hmC'"'^'" 
(56.9%) occur at genie regions (Additional file 5c), and this 
is distinct from hmC distribution in ESCs [17]. By contrast, 
mC*^'^'^ are not enriched at enhancers or genie regions. We 
calculated the hmC and mC levels of these genomic ele- 
ments (Figure 2c). We observed that poised enhancers 
have the highest hmC level (32.6% on average), followed by 
active enhancers (28.6%), introns (27.8%), exons (27.7%), 
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Figure 2 Features of hydroxymethylome in the human brain, (a) Classification of all CpGs in the adult brain according to their 
hydroxymethylation and methylation frequencies, (b) Fold enrichment of the CpG categories on different genomic elements. hmC'^'^'^ is enriched 
in enhancers, exons and introns. (c) The absolute hmC and mC levels at different genomic elements in the adult brain, (d) Fold enrichment of 
the Fetal > Adult hmCGs, which exhibited higher hmC levels in the fetal brain than in the adult brain, on different genomic elements. 



and intergenic regions (23.6%). The mC levels follow the 
reverse order, with the intergenic regions (58.2%) being 
highest, followed by exons (57.4%), introns (56.4%), and en- 
hancers (42.1% and 37.6% for poised and active enhancers, 
respectively). The promoter regions have both the lowest 
hmC and mC levels. Repetitive sequences, including LINE, 
SINE, LTR, and major satellite repeats, generally have 
lower hmC and higher mC levels than the non-repetitive 
regions, and the major satellite repeats have the lowest 
hmC level (5.4%). 

We determined that the hmC levels of the fetal brain 
are much lower than those of the adult brain in all gen- 
omic regions (Additional file 5d). Despite this, we identi- 
fied 102,569 CpGs that exhibited higher hmC levels in the 
fetal brain than in the adult brain (both coverage >10, 
methylation difference >0.3). Analysis for the genomic dis- 
tribution of these Fetal > Adult hmCGs showed that they 
are highly enriched at enhancers, but not at other genomic 
features (Figure 2d). The result suggested that hmC 
changes at enhancers are bidirectional with a subset of en- 
hancers gaining and some losing hydroxymethylation 



from the fetal to the adult stage, which is consistent with 
the previous report that hmC marks regulatory elements 
in the fetal brain that is poised for subsequent activation 
in the adult brain [25]. 

Prominent hmC peaks mark the exon-lntron boundary 

Despite previous suggestions that hmC may be associ- 
ated with the exon-intron boundary in the brain [14], 
the low-resolution method employed previously pre- 
vented accurate analysis at the base resolution. Con- 
versely, whole-genome BS-Seq studies have shown that 
DNA modification is abundant on exons [29-31]. These 
studies, however, were not able to distinguish between 
mC and hmC. One intriguing finding of our analyses of 
the base-resolution hmC and mC maps is two striking 
hmC peaks at the 5' splicing sites (5'ss), which are posi- 
tioned flanking the highly conserved 'GT' sequence, with a 
higher peak siting at positions -1 and -2 on the exon side 
and a lower one at positions +4 and +5 on the intron side 
(all internal exons, n = 176,455) (Figure 3a and Additional 
file 6). In contrast, mC is not enriched. Notably, analysis 
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Figure 3 (See legend on next page.) 
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(See figure on previous page.) 

Figure 3 Prominent hmC changes mark the exon-intron boundary, (a, b) Profiles of InmC and mC for a 40-bp window around tlie exon-intron 
and intron-exon boundaries at single-nucleotide resolution in liuman (a) and mouse (b). Modification levels of hmC, mC, total DNA methylation (hmC + hmC), 
and the CpG number are shown for all internal exons in the sense strand. The sequences ± 9 bp around the 5' and 3' splicing sites are also indicated. The 
TAB-Seq and BS-Seq data to generate the mouse profile (b) were obtained from Lister ef al. [25]. (c) Profiles of hmC and mC at the exon-intron boundary of 
exons which have a CpG at 5'ss position -2 (-2CG, n = 8,212) or -1 (-1 CG, n = 4,277). Since a CpG at one position will lead to absence of CpG at the nearest 
neighboring position and thus no methylation value, we merged the data of the sense and the antisense strands for each t/pe of exons. (d) Profiles of hmC 
and mC across exons. All internal exons were divided into 100 bins, and average hmC and mC levels were calculated for each bin, as well as ±150 bp 
surrounding the exon. (e) The hmC and mC profiles at the exon-intron boundary of the first exon. 



of the recently published TAB-Seq (GSMl 173795) and 
BS-Seq (GSM1173783) data generated from the adult 
mouse brain tissue (frontal cortex from 6-week-old male 
mouse brain) [25] revealed two nearly identical hmC 
peaks at the 5'ss, indicating that this pattern is conserved 
in mammals (Figure 3b). The CpGs at both peaks are not 
part of the consensus 5'ss sequence (CAG/GTAAGT). 
However, the peak positions -2 and +4 have higher ten- 
dencies to be CpG sites than the overall exons and introns 
(Figure 3a and b). The profile of CpG distribution has 
been recently reported [32]. We further determined that, 
of the 176,455 examined human exons, 12,934 (7.3%, 4722 
at position -1 and 8,212 at position -2) have CpG sites at 
positions related to the exon-side peak and 5,514 (3.1%, 
4605 at position -i-4 and 909 at position -i-5) have CpG sites 
related to the intron-side peak (Figure 3a). Overall, these 
exons (n = 18,036, 10.2% of all 176,455 human internal 
exons) represent a large proportion of human genes (n = 
9,103, 48.9% of all 18,606 human genes, see Additional file 
7 for 5'ss sequences of these exons and the corresponding 
genes). To exclude the possibility that the hmC peaks are 
artifacts due to higher hmC levels of the exons which have 
a CpG site at the 5'ss than those do not have, we separ- 
ately analyzed the exons which have a CpG site at the 5'ss 
position -2, -1, -t-4 or -i-5, and the results are similar to that 
for all internal exons (Figure 3c and Additional file 8). 
These results together indicated that prominent hmC 
peaks mark the 5'ss at the exon-intron boundary in both 
human and mouse, and are associated with a large cohort 
of internal exons. 

A decrease in hmC at positions +6 and -i-7 directiy fol- 
lowing the intron-side hmC peak was also observed for all 
internal exons (Figure 3a and Additional file 5: Figure S4), 
which seems to be similar to the intronic hmC decrease 
reported previously [14]. In addition to these hmC 
changes at the 5'ss, we also observed a marked increase in 
mC and a less pronounced decrease in hmC from 5 ' to 3 ' 
across the exons (Figure 3c). We also examined the first 
exons (n = 12,980), and found that they have higher CpG 
occurrence and much lower hmC and mC levels than the 
internal exons (Figure 3d), which should be due to their 
proximity to promoter CpG islands. Interestingly, in con- 
trast to the internal exons, mC is strongly enriched, and 
hmC change follows the mC change. The distinction 



between the internal exons and the first exons is suggest- 
ive of a functional connection between the hmC peaks 
and alternative exon inclusion, which should only occur in 
the internal exons. 

To determine whether the observed hmC and mC 
changes are associated with gene expression, we generated 
RNA-Seq data using RNA extracted from the same adult 
brain sample for TAB-Seq and BS-Seq and divided the 
expressed genes (RPKM >0.1) into three groups of identical 
size of high, middle and low expression levels (Additional 
file 9). The genes within these three groups, as well as 
genes not expressed, displayed similar hmC changes 
around their exon-intron boundary. This was unlike overall 
hmC levels within both exons and introns, which positively 
correlates to gene expression levels (Figure 4a). In addition, 
the hmC decrease and the mC increase across the exon are 
also similar among genes transcribed at different levels 
(Figure 4b). These results suggested that the hmC and mC 
changes occur irrespective of the transcription levels of the 
corresponding genes. 

Next, we address the potential functional relationship 
between the DNA modification at the exon-intron bound- 
ary and the splicing. We classified the exons having a CpG 
site at the 5 'ss position -2, -1, -i-4 or -i-5 into four types ac- 
cording to the DNA modification states of the CpG site 
(that is, mC'^'^, hm&'^\ modC'""', or modC"°) and com- 
pared their inclusion rates (see Materials and methods). 
We also examined the number of the alternatively spliced 
(AS) exon, which was defined as the inclusion rate being 
less than 0.8. The results demonstrated that the hmC'"^'^ 
and mC'^'^'^ exons comprised a similar fraction of AS 
exons (6.04% (128 out of 2,119 exons) vs. 5.63% (367 out 
of 6,522 exons). Figure 4c), with the inclusion rates being 
no difference (median 0.9901 vs. 0.9908, P = 0.13, two- 
tailed Mann- Whitney- Wilcoxon (MWW) test. Figure 4d). 
Also no difference was found comparing the hmC'^'^'' and 
j^(-;high gjjQj^g with the internal exons put together (me- 
dian inclusion rate 0.9912, P >0.1, AS exons percent: 
6.76% (6,929 out of 102,474 exons)). These results sug- 
gested that hmC and mC at the 5'ss CpG site are similarly 
related to splicing. Interestingly, the modC'°" and modC"° 
groups involved markedly more AS exons (9.67% (59 out 
of 610 exons) and 18.76% (103 out of 549 exons), respect- 
ively. Figure 4c). In addition, the inclusion rates of the 
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Figure 4 Correlation of the hmC and mC changes at the exon-lntron boundary with gene expression and splicing, (a, b) The hmC 

changes at the exon-intron boundary (a) and across the exon (b) are similar for exons expressed at high (red), middle (green), or low (blue) levels, 
as well as exons with no expression (black), (c, d) The percentages of alternatively spliced exons (c) and the inclusion rates (d) of the exons with 
the status of the 5'ss CpG site being mC'^'^'^, hmC*^'^*^, modC'°", or modC"° showing differential splicing of these exon types. Two-tailed MWW 
test, n, number of exons. 



mode °" and modC"° exons were slightly but significantly 
reduced comparing with the hmC'"«h ^Qhi^h 

groups 

(modC'°" vs. hmC'^'s'^: median 0.9885 vs. 0.9901, P = 0.04; 
modC"° vs. the hmC'''^'': median 0.9859 vs. 0.9901, P = 
2.45 X 10 ^ two-tailed MWW test. Figure 4d). These data 
suggested that complete demethylation of mC and hmC 
to C might lead to exon skipping, which is in line with 
previous studies that DNA methylation facilitates exon 
recognition [32,33]. 

Strand-biased hmC and mC on the gene body 

Previous studies have reported that hmC is enriched on 
gene bodies and positively correlated with gene expres- 
sion in the adult brain [8,9,13,21,25]. We plotted the 
average hmC and mC levels across the genes at high, 
middle, and low expression levels and the data quantita- 
tively confirmed that hmC is markedly enriched on the 
gene body and positively correlated with gene expression 
levels in the human adult brain (Figure 5a, left panel). In 



addition, we also showed that mC levels are clearly nega- 
tively correlated with gene expression (Figure 5a, right 
panel). Surprisingly, we observed that the hmC and mC 
abundances are slightly but significantly different be- 
tween the sense and antisense strands, with enrichment 
of hmC on the sense strand and enrichment of mC on 
the antisense strand. The differences are positively corre- 
lated with gene expression levels as highly expressed 
genes show the strongest biases, whereas genes that are 
not expressed exhibit nearly no difference (Figure 5a). 

To evaluate this pattern quantitatively and at a higher 
level of resolution, all of the expressed genes were di- 
vided into 10 equal-sized groups (approximately 1,400 
genes for each group) based on their expression levels 
and the gene-body mC and hmC levels were calculated 
for each strand of the genes. The results revealed that, 
from the lowest to highest expressed genes, an average 
seven-fold increase in the hmC bias (%hmC of the sense 
strand minus that of the antisense strand, from 0.09% to 
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(See figure on previous page.) 

Figure 5 Strand-biased hmC and mC profiles on the gene body, (a) Profiles of limC (left panel) and mC (right panel) on the sense (lined) 
and antisense (dotted) strands of the genes expressed at high (red), middle (green) and low (blue) expression levels, as well as the genes with no 
expression (black). TSS and TTS indicate the transcription starting site and the transcription terminal site, respectively, (b) Expressed genes were 
divided into 10 groups according to the expression levels (left panel), and the average levels of hmC (middle panel) and mC (right panel) for 
each strand of the gene body were measured. The values for the genes that are not expressed (expression level 0) and randomly selected 
intergenic regions as the control (C) are also shown. One-tailed paired Student's f test nd, no statistical difference (P >0.05). (c) Profiles of hmC 
on the sense (lined) and antisense (dotted) strands of exons with high (red) or no (black) expression, (d) The hmC and mC strand-biases are 
reduced at the sense-antisense gene (SAS) paired regions in comparison with the non-SAS regions, (e) The hmC profile on the sense (lined) and 
antisense (dotted) strands of the genes that are expressed (red) or not expressed (black) in the mouse brain exhibits the transcription-correlated hmC 
bias toward the sense strand similar to the human pattern. The TAB-Seq, BS-Seq, and RNA-Seq data for analysis were obtained from Lister ef ai. [25]. 



0.69%) and an average five-fold increase in the mC bias 
(%mC of the antisense strand minus that of the sense 
strand, from 0.22% to 1.04%) were observed (Figure 5b). 
Also, the strand differences of both hmC and mC are 
significant (for example, P= 1.63x10'^^ for hmC and 
P = 4.4 X 10'^^ for mC in genes of express group 9, two- 
tailed and paired Student's ^-Test, see Additional file 10 
for all calculated P values, as well as the hmC, mC, and 
mode levels on each strand of all individual genes). Both 
hmC and mC also displayed a slight difference in genes 
that are classified as not expressed (n = 8,789), but these 
differences were not statistically significant {P = 0.39 for 
hmC and P = 0.44 for mC). No strand difference was 
found for randomly-selected intergenic regions (Figure 5b). 
Also, for most expression levels, the strand differences of 
the total modification were not significant as measured by 
BS-Seq (Additional file 11), explaining why it has not been 
observed in previous studies using BS-Seq alone. 

Then we plotted the average hmC and mC levels 
across each strand of the exon and the results showed 
that the exon exhibited the strand difference at a similar 
level as that of the gene body, suggesting that the DNA 
methylation strand difference is a general feature of the 
transcribed exons and introns (Figure 5c and Additional 
file 12). To further verify the strand-bias, we analyzed 
the sense-antisense (SAS) gene paired regions. Since 
within these regions, each strand serves as both sense 
and antisense strands, it is expected that the strand dif- 
ference should be lessened. The results indeed showed 
that both the hmC and mC biases decreased in the SAS 
regions, with a 39% reduction in the hmC bias (0.45% in 
non-SAS region vs. 0.28% in SAS region) and a 66% re- 
duction in the mC bias (0.89% in non-SAS region vs. 
0.3% in SAS region. Figure 5d). To generalize our obser- 
vation, we analyzed the recently published TAB-Seq and 
BS-Seq data in the adult mouse brain [25]. The results 
notably showed that the mouse brain cells share a simi- 
lar transcription-associated hmC bias toward the sense 
strand (Figure 5e) and an mC bias toward the antisense 
strand (Additional file 13), indicating that these hmC 
and mC features are evolutionary conserved between 
mouse and human. 



To determine whether the hmC and mC strand biases 
are cell type-specific, we examined genes enriched in 
different brain cell types including neurons (n = 1,417), 
astrocytes (n = 1,807) and oligodendrocytes (« = 1,453) 
[34]. We found that all three gene sets and the house- 
keeping genes (n = 2,402) [35] exhibited enrichment 
of hmC on the sense strand and enrichment of mC on 
the antisense strand (P <0.001, Figure 6a). Furthermore, 
to independently validate our findings, we isolated 
neuronal nuclei by FACS (Figure 6b and Additional file 
14) and employed a Tet-assisted reduced representation 
bisulfite sequencing (TA-RRBS) approach by perform- 
ing TAB-Seq on Mspl-enriched DNA fragments of 
genomic DNAs extracted from these neuronal nuclei, 
and then quantified the hmC levels on both stands of 
the neuron-enriched genes. The results confirmed that 
hmC is significantly enriched on the sense strands of 
the neuronal genes at high expression levels (for ex- 
ample, P=2.4x 10' for expression level 4, Figure 6b). 
The strand difference is not significant for genes 
expressed at low levels {P >0.05 for expression levels 1 
and 2, Figure 6b). Together, our data indicated a 
transcription-correlated hmC bias toward the sense 
strand and an mC bias toward the antisense strand of 
the gene body in both neurons and glia in the human 
and mouse adult brain. 

hmC is enriched at poised enhancers and is negatively 
correlated with repressive histone modifications 

Next, we associated the hmC/mC profiles with the 
ChlP-seq data of various chromatin modification marks 
[27]. At the active enhancer, both hmC and mC levels 
are depleted towards the core region, and hmC is rela- 
tively accumulated at the flanking region. At the poised 
enhancer, both modifications revealed less pronounced 
reduction at the core region, particularly for hmC 
(Figure 7a). The difference in hmC profiles between the 
active and poised enhancers is more strikingly displayed 
by the high proportion of the hmC*^'^*^ subgroup at the 
core region of the poised enhancer (Additional file 15). 
These results are consistent with previous reports in ESCs 
[15,17] and in the mouse brain [25], suggesting that hmC 
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also participates in maintaining a 'poised' chromatin struc- 
ture of enhancers in the human adult brain. 

Lastly, we address the relationship between DNA modifi- 
cation and two repressive histone modifications, H3K27me3 
and H3K9me3. The results demonstrated that hmC is 
clearly depleted at these repressive regions, with the average 
hmC levels of both the H3K27me3 and H3K9me3 regions 
being lower than the overall intergenic regions (20.5% vs. 
23.9% and 14.6% vs. 23.9%, respectively, Figure 7a). By con- 
trast, the mC levels in both of these repressive regions are 
higher than at the overall intergenic regions (approximately 
64.4% vs. 58.2%, Figure 7a). An example of depletion of 
hmC on the genomic regions marked by H3K27me3 and 
H3K9me3 was provided in Additional file 16. It was shown 
recently that brain tissues exist in a unique chromatin state, 
with expansion of H3K27me3 over a particularly large 
proportion of the intergenic regions, accompanied by a dra- 
matic restriction of H3K4mel sites within the transcrip- 
tional units [27]. The association of our hmC/mC profiles 
with these two histone modification maps, as well as 
H3K4me3 peaks at the promoter regions, can be viewed in 
genomic regions surrounding the amyloid precursor protein 
(APP) and Neurexin 1 (NRXNl) gene loci as examples 
(Figure 7b and Additional file 17), which are representative 
of transcribed genes in the adult brain. 

Conclusion 

In this study, we mapped DNA hydroxymethylation and 
methylation across the whole genome at single-base 
resolution in the human brain by combined application 
of TAB-Seq and BS-Seq, which revealed several novel in- 
sights regarding the nature of DNA methylation in brain 
tissues. The first intriguing finding of this study is the 
two striking hmC peaks at the 5 'ss, that is, the sharp in- 
crease in hmC levels at positions -1 and -2 at the exon 
side and at positions +4 and +5 at the intron side sur- 
rounding the highly conserved 'GT' splicing sequence. A 
previous study has revealed enrichment of DNA modifi- 
cation (hmC + mC) at the exon-intron boundary in 
human embryonic stem cells and fibroblast [30]. Since 
BS-Seq alone was applied in this study, mC and hmC were 
not distinguished. In the present study, by the combined 
use of TAB-Seq and BS-Seq, we unexpectedly found that 
it is the hmC that forms peaks at the 5'ss in the adult 
human brain whereas mC is not enriched. This finding 
suggested that hmC is the key modification marking the 
exon-intron boundary in the adult brain tissues. It de- 
serves to be mentioned that this pattern is evolutionary 
conserved since the mouse brain cells share similar hmC 
and mC features at the exon-intron boundary. A recent 
study also revealed significant hmC but not mC changes 
at the exon-intron boundary in the human and mouse 
brain. However, using an enzymatic method which rec- 
ognized hmCs in the context of C'^'^CGG, this study 



only detected the intronic hmC decrease, while the more 
significantly marked hmC peaks were not reported [14]. 
Further studies are needed to elucidate how hmC peaks at 
the 5' splicing sites are mechanically linked to spli- 
cing, which is under regulation by chromatin structures 
[36]. The results may indicate a high turnover rate of 
DNA methylation at the 5'ss, or that the hmC specific- 
ally recruits reader proteins that directly or indirectly 
affect splicing. Interestingly, our analysis suggested that 
both hmC and mC may facilitate exon recognition. This 
may be related with the reports that MeCP2 (methyl- 
CpG-binding protein 2), an extremely abundant protein in 
brain, binds to hmC and mC with similar high affin- 
ities and helps exon recognition [21,33]. 

Second, our data revealed a surprising transcription- 
correlated hmC bias toward the sense strand and an mC 
bias toward the antisense strand of the gene body in the 
adult brain tissues of both human and mouse. In con- 
trast to the well-established role of DNA methylation in 
CpG island promoter regions for transcription repres- 
sion, the function of gene body methylation is still 
largely unknown [37-41]. Previous studies have revealed 
a positive correlation between gene expression and the 
gene-body mC level [39-42]. More recently, we and 
others have shown the gene-body hmC level is also posi- 
tively correlated with gene expression in brain tissues 
and germ line cells [8,9,16,21,25]. While it has been hy- 
pothesized that the function of gene-body methylation is 
to repress spurious initiation of transcription within ac- 
tive genes including that from the repetitive elements 
[41,43], our finding implies to an intrinsic link between 
the transcription elongation and the DNA modification 
processes. Interestingly, an enrichment of non-CG methy- 
lation on the antisense strand has been observed in ESCs 
previously [29] . Regarding the establishment of this strand 
bias, one possibility is that the antisense strand acts as the 
template for mRNA transcription; the non-template 
strand might thus be more accessible to TET proteins in 
some way, resulting in the enrichment of hmCG and, 
hence, the accumulation of mCG and non-CG methyla- 
tion on the template strand. The asymmetry distribution 
of DNA hydromethylation and methylation might also be 
interpreted in brain cells to play some regulatory roles in 
transcription, which should need further functional studies 
in the future. 

Third, we demonstrated that, in the adult human 
brain, 13.4% of all CpGs (k = 5,692,354) are highly- 
hydroxymethylated (the hmC'^'^'^ category). Since hydro- 
xymethylation is the main modification status of these 
CpG sites, it points to a regulatory role of these CpG sites 
when hydroxymethylated. Indeed, we found that this CpG 
category is strongly enriched at enhancers and genie re- 
gions, which is distinct from the highly-methylated 
(mC'^'*'^), as well as the lowly-hydroxymethylated (hmC'°", 
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data not shown) categories. The hm&^ is also more 
enriched at poised enhancers, which is distinct from en- 
richment of modC'°" at active enhancers. Finally, we 
showed that the hmC level is greatly reduced in 
H3K27me3-marked regions, which is different from its en- 
richment on the H3K27me3-associated bivalent promoters 
in ESCs [5-7]. This difference should be associated with the 
dramatic expansion of H3K27me3 in differentiated cells 
and tissues including the brain [27]. It will be of interest to 
investigate how these highly-hydroxymethylated CpG sites 
might contribute to the regulation of the unique chromatin 
organization in the brain tissues. 

In summary, we present the genome-wide and single- 
base-resolution maps of hmC and mC in the human 
brain. The results imply novel roles for hmC in gene 
splicing and gene expression regulation. The identified 
highly-hydroxymethylated CpGs, which comprise 13.4% 
of all CpGs in human genome and are strongly enriched 
at poised enhancers and actively transcribed regions, 
also serve as a starting point for future investigations on 
the mechanisms that establish hydroxymethylation pat- 
terns and their associated functions in the human brain. 

Materials and methods 

Biological samples and genomic DNA extraction 

This project was approved by the Reproductive Ethics Com- 
mittee of Peking University Third Hospital (2012SZ010). 
Brain tissues for research were obtained with the written 
informed consent. The methods applied in this study are 
complied with the Helsinki Declaration. For TAB-Seq and 
BS-Seq, the adult human brain samples were obtained from 
a postmortem 42-year-old woman, and the fetal brain sam- 
ples were obtained from an aborted 22-week-old male fetus. 
For LC-MS/MS, another adult human brain sample was ob- 
tained from a postmortem 28-year-old man, and another 
fetal sample was obtained from an aborted 22-week-old fe- 
male fetus. All individuals had no sign of brain disea- 
ses. Genomic DNAs were extracted using Blood & Cell 
Culture DNA Kits (Qiagen) following the manufacturer's 
instructions. 

Preparation of genomic DNA for TAB-Seq 

For TAB-Seq, genomic DNAs and spike-in control 
DNAs were sheared to an average size of 200 bp using a 
Covaris S2 instrument. Glycosylation and oxidation of 
genomic DNAs was performed following a previous 
protocol with small modifications [44]. Briefly, 5 [ig of 
sheared genomic DNA or with spike-in controls was ini- 
tially glycosylated using |3-glucosyltransferase proteins 
that were expressed and purified as previously described. 
Then, the DNA was purified using the QIAquick Nu- 
cleotide Removal Kit (Qiagen). Next, the oxidation reac- 
tion was performed using 1.5 \ig of glycosylated DNA 
and 30 |iL recombinant mTetl protein in a 150 ^iL 



reaction solution and incubated for 1 h at 37°C. After 
proteinase K treatment, the oxidized DNA was first 
purified with Micro Bio-Spin 30 Columns (Bio-Rad) and 
then with 1.8 x Ampure XP Beads (Beckman) following 
the manufacturer's suggestions. 

For Tet-assisted reduced representation bisulfite se- 
quencing (TA-RRBS), genomic DNAs extracted from 
purified neuron nucleus were first digested with Mspl for 
3 h at 37°C and purified using the QIAquick Nucleotide 
Removal Kit as described [45]. Then, glycosylation and 
oxidation of the DNA was performed similar to the 
TAB-Seq. 

Library preparation for TAB-Seq, BS-Seq, and sequencing 

The DNA sample for BS-Seq was the same batch of 
sheared genomic DNA with spike-in controls that was 
used for TAB-Seq but without glycosylation and oxidation 
preparation. DNA samples for TAB-Seq and BS-Seq (0.5 to 
1 |ig) were end-repaired, A-tailed, and ligated to methyl- 
ated adaptors using lUumina TruSeq Sample Preparation 
Kits following the manufacturer's instructions. The ligated 
fragments were then bisulfite converted using the Methyl- 
Code Kit (Invitrogen). Bisulfite-treated DNA was separated 
into three reactions for PGR amplification using PfuTurbo 
Cx Hotstart DNA polymerase (Agilent Technologies) for 
seven cycles, yielding three independent libraries for the 
same biological sample. Final libraries were purified with 
Ix AMPure XP beads and sequenced using an lUumina 
Hiseq 2000. For Tet-assisted reduced representation bisul- 
fite sequencing, an additional size selection of 180 to 
600 bp fragments on a 2% agarose gel was performed after 
the adapter ligation. 

Spike-in controls for TAB-Seq, TARRBS, and BS-Seq 

Fully methylated Lambda DNA was added to the human 
brain genomic DNAs in a ratio of 1 to 200 as a spike-in 
DNA control to calculate the non-conversion rates of 
unmodified cytosine and mC. For the spike-in control to 
calculate the non-conversion rates of hmC, a DNA frag- 
ment of approximately 1.6 kb was amplified from a 
pUC19 vector using a primer pair (5'-GCAGATTG 
TACTGAGAGTGC-3' and 5'-TGCTGATAAATCTG 
GAGCCG-3') and a cocktail of dATP/dGTP/dTTP and 
dhmCTP (Zymo Research, Cat. No. D1045). The control 
was then added to genomic DNA at a ratio of 1 to 400. 

RNA preparation and sequencing 

Total RNA was extracted from the tissue sample using 
QIAGEN RNeasy Mini Kits (Qiagen), and mRNA was 
isolated by Sera-Mag Oligo (dT) beads (Thermo Scien- 
tific). Libraries were prepared using the NEB Next 
mRNA Sample PreP Master Mix Set 1 according to the 
manufacturer's protocol and sequenced using an lUu- 
mina Hiseq 2000. 
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Quantitative analysis of hmC using LC-MS/IViS 

Genomic DNA (2 |ig) was digested by nuclease PI, venom 
phosphodiesterase I (Type VI), and alkaline phosphatase 
(Sigma). After desalting and filtration, 10 i^L of the solution 
was injected into HPLC-MS/MS for analysis. HPLC-MS/ 
MS was carried out by reverse-phase ultra-performance li- 
quid chromatography on an Agilent ZORBAX Eclipse 
XDB-C18 column (Rapid Resolution HT, 50 x 2.1 mm P.N. 
927700-902), equipped with a ZORBAX Eclipse XDB-C8 
guard column (Column: P.N. 821125-926, Cartridges P.N. 
820555-901), eluted with buffer A (0.1% formic acid in 
H20) and buffer B (0.1% formic acid in methanol) with a 
flow rate of 0.5 mL min-1 at 35°C with a 2% to 25% gradi- 
ent in 4.5 min, with online mass spectrometry detection 
using Agilent 6410 triple-quadrupole (QQQ) LC mass 
spectrometer in multiple reaction monitoring (MRM) posi- 
tive electrospray ionization (ESI) mode. The nucleosides 
were quantified using the nucleoside-to-base ion mass 
transitions of 258 to 142 with collision energy of 1 eV 
(hmC) and 228 to 112 with collision energy of 5 eV (C). 
Quantification and detection limits were determined by 
comparison with the standard curves obtained from nu- 
cleoside standards running at the same volume and time. 

Neuronal nuclei isolation 

Nuclei extraction was processed as described [46]. 
Briefly, 500 mg of cryopreserved human left frontal cor- 
tex was homogenized on ice by douncing in 5 mL lysis 
buffer (0.32 M Sucrose, 5 mM CaC12, 3 mM, 0.1 mM 
EDTA, 10 mM Tris-HCl (pH 8.0), 1 mM DTT, and 0.1% 
TritonX-100). Homogenates were transferred to ultra- 
centrifuge tube, with Sucrose Solution (1.8 M Sucrose, 
3 mM Mg(Ac)2, 1 mM DTT, and 10 mM Tris-HCl 
(pH 8.0)) carefully pipetted at the bottom of the tube to 
form a concentration gradient. Ultracentrifugation was 
performed at 100,000 g for 2.5 h at 4°C. After centrifuge, 
supernatant including debris was removed and the pellet 
was incubated in 0.5 mL cold PBS on ice for 20 min be- 
fore thoroughly triturated by pipetting. 

Immunostaining mix was prepared by combining 
300 \iL PBS, 1.2 ^g NeuN antibody (Millipore, MAB377), 
200 |iL Blocking Mix ( 2.5% BSA and 10% Goat Serum in 
PBS), and 2 |ig of Alexa Fluor 488 conjugated secondary 
antibody (Cell Signaling, #4488) together and rotated for 
5 min at room temperature in the dark. An isotype anti- 
body control was processed in parallel by adding the same 
amount of mouse IgG instead of NeuN antibody. Then 
1 mL nuclei suspension was added, and the mixture was 
incubated by rotating in the cold room for 45 min. After 
incubation, samples were retrieved and stained with 
Hoechst 33342 for another 2 min. The immunostaining 
result was checked under the microscope. 

Immumo tagged samples were diluted 10 times in PBS 
and filtered through a 40 |im filter before loaded to the 



FACS machine. A preliminary run was performed to 
gate out the proper nuclei size and the fluorescence 
bright population before the sort which separate the 
NeuN + nuclei. Once the sort was done, a small amount 
of sorted sample was run again through the instrument 
to confirm the purity. 

After FACS, PBS was added to raise the volume of the 
sort to 10 mL. Then the sorted sample was mixed with 
2 mL Sucrose Solution, 50 ^L CaC12 (1 M), and 30 ^iL Mg 
(Ac)2 (1 M), and incubated on ice for 15 min before cen- 
trifuged at 1,786 g for 15 min at 4°C. The NeuN + nuclei 
pellet was resuspended in PBS and went on with the Tet- 
assisted reduced representative bisulfite sequencing. 

Data processing 

FASTQ format reads generated by the lUumina HiSeq2000 
platform were aligned to the human reference sequence 
(HG19) using the Bismark program. Briefly, first, whole or 
any subsets of adaptor sequences were trimmed on 3 ' and 
5 ' of reads before alignment. Second, a read was removed 
if more than 10% bases were N, or more than 50% bases 
were of Phred quality lower than 5, or at least three 
unmethylated CHs were present, or PCR redundancy oc- 
curred, so that only high quality data were used for down- 
stream analysis, as described previously [17,29]. Third, 
cytosines in a read were computationally replaced with 
thymines and then mapped to computationally converted 
HG19 references using the Bismark program. The Lambda 
genome and the pUC19 sequences for the hmC spike-in 
control were also included in the reference sequence as 
extra chromosomes for assessing the non-conversion rates 
of unmodified cytosine, mC, and hmC. 

Assessing the non-conversion rates of unmodified 
cytosine, mC, and hmC and the protection rate of hmC 

The non-conversion rate of unmodified cytosine was calcu- 
lated as the percentage of sequenced cytosines in non-CG 
contexts relative to all covered cytosines in non-CG con- 
texts in the Lambda genome. The non-conversion rate of 
mC was calculated as the value in a CG context, and the 
non-conversion rate of hmC was measured as the value for 
all cytosines in the spike-in pUC19 sequences. The original 
data are shown in Additional file 1: Table SI. The normal- 
ized protection rate of hmC was assessed by dividing the 
non-conversion rate of hmC of TAB-Seq by that of BS-Seq. 

Assessing the false discovery rates of modC and hmC 

The P value for each cytosine detected by TAB-Seq and 
BS-Seq was calculated via binomial distribution as previ- 
ously described [17,29]. To measure a false discovery 
rate (FDR) for the modC and hmC sites passing a given 
P value cutoff, we applied the Benjamini-Hochberg 
method [47] and set the FDR <0.01. The FDR was calcu- 
lated separately for each chromosome and cytosine in 
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each of the CG, CHG, and CHH contexts. We also ap- 
phed the method described previously [17] to calculate 
the FDR for the TAB-Seq data and found that although 
this method yielded similar results for the fetal TAB-Seq 
data and for cytosines in the CHG and CHH contexts in 
the aduh TAB-Seq data with a FDR <0.05, the FDR was 
higher than 0.1 even at a P value of lE-8 when applying 
it to the adult TAB-Seq data. This is likely because the 
number of bona fide hydroxymethylated cytosines of the 
adult brain is much greater than those of ESCs and the 
fetal brain. To be consistent, we applied the Benjamini- 
Hochberg method for both the adult and fetal TAB-Seq 
data. For calculating the FDR for the BS-Seq data, the 
Benjamini-Hochberg method yielded a similar result 
with the method described by Lister et al. [29] . 

Modification frequency of individual CpG sites 

The mode frequency was calculated as dividing modCG 
by the total coverage according to the BS-Seq data 
(modCG/CG). The hmCG frequency was similarly calcu- 
lated according to the TAB-Seq data (hmCG/CG), and 
the mCG frequency was calculated by subtracting 
hmCG/CG from modCG/CG. 

Quantification of the hmC and mC levels 

The hmC level (%hmC) at a given genomic region or site 
is calculated by dividing the number of sequenced cyto- 
sines by the number of sequenced cytosines plus thymines 
at this region or site according to the TAB-Seq data, where 
the reference is in CG context. The mC level (%mC) is cal- 
culated by subtracting %hmC from the total modification 
level (%(hmC + mC)) according to the BS-Seq data. 

Genome annotation 

All genomic features were defined based on the HG19 
genomic annotation downloaded from the UCSC data- 
base. Different genie elements, including transcription 
start sites (TSS), exons, introns, and transcription ter- 
minal sites (TTS), were defined based on the Reference 
Sequence (RefSeq) database. The CGIs were retrieved 
from the cpglslandExt table in UCSC database, and pro- 
moter CGIs were defined as overlapping with + 1 kb of 
TSS. Repetitive sequences (SINE, LINE, LTR, and Major 
Satellite) were acquired from the Repeat-Masker track in 
the UCSC database. 

Calculating enrichment of CpG categories on genomic 
elements 

For a given CpG group (modC'°^ mC*''^'', hmC'^'^'", or 
Fetal > Adult hmCGs), we counted the fraction of the in- 
volved CpG sites at each genomic element as the 'ob- 
served' distribution value. We also counted the fraction 
of the overall captured genomic CpG sites at each gen- 
omic element as the 'expected' distribution value. The 



fold enrichment value for each genomic element was 
calculated by dividing the 'observed' distribution value 
by the 'expected' distribution value. 

Profiling hmC and mC at the exon-intron boundaries 

A total of 176,455 internal exons representing 18,606 
genes were retrieved from the RefSeq database, with ex- 
clusion of all first last exons and single-exon genes. A total 
of 12,980 first exons were also retrieved. The average 
hmC, mC, and modC levels, the hmC/mC ratio, the num- 
ber of four modCG groups relative to all CpGs, and the 
CpG sites were calculated for each base ±150 bp around 
the exon-intron boundaries. The base composition was 
also measured ± 10 bp around the 5 ' and 3 ' splicing sites. 

Calculation of exon inclusion rates 

To identify exons that are alternatively spliced, we followed 
the method described previously [14] with modifications. 
First, a library of exon-exon junction (EEJ) sequences that 
comprised all possible exon-exon combination for a gene 
was generated. Second, RNA-Seq data generated in this 
study were merged with RNA-Seq data of human cerebral 
cortex generated previously [48], and then were aligned to 
the EEJ library using the Bowtie program. Reads mapping to 
the genomic sequences were discarded before alignment. To 
determine an EEJ, at least eight mapped nucleotides were 
needed for each of the two exons. Then, the generated EEJ 
data were used for calculation of the inclusion value of an 
exon as '% inclusion = 100 x (sum(CiA) + sum(ACj))/((sum 
(ClA) + sum(ACj) +2 x (sum(CiC2) + sum(ClCj))); where 
Ci is any possible splicing donor upstream the alternative 
exon, CI is the first splicing donor upstream the alternative 
exon, Cj is any possible splicing acceptor downstream the al- 
ternative exon, C2 is the first splicing acceptor downstream 
the alternative exon, and A indicates the examined exon. A 
minimum of 10 supporting reads are required and exons 
without a AG dinucleotides at the 3 'splicing sites are omit- 
ted. An alternatively spliced exon was defined when the in- 
clusion value was less than 0.8. 

Strand-biased hmCG and mCG profiles on gene bodies 

Gene body regions between TSS and TTS were divided 
into 100 equally sized bins, and average hmC and mC levels 
were calculated for each bin as well as in 100-bp windows 
throughout 10 kb upstream and downstream of the gene 
body. The sense strand and the antisense strand were ex- 
amined separately. A two-tailed and paired Student's i-Test 
was applied to determine whether the sense and antisense 
strand were significandy different from each other. 

Identification of the sense-antisense (SAS) gene paired 
region 

All expressed genes (RPKM >1) were divided into two 
groups according to whether they are transcribed along 
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the positive strand or the negative strand. The SAS gene 
paired region was identified as the overlap region be- 
tween these two gene groups. 

Identification of cell type-specific genes and 
house-keeping genes 

Genes enriched in different brain cell types including 
neurons, astrocytes, and oligodendrocytes were defined 
according to a previous paper with exclusion of genes 
that are not expressed in our RNA-Seq data [34]. The 
house-keeping genes were defined according a previous 
publication [35]. 

Association of hmC and mC profiles with ChlP-Seq data 

To profile the distribution of hmC and mC on genomic re- 
gions with different histone modifications, we used the re- 
centiy published genome-wide maps of chromatin states in 
the adult brain midfrontal lobe [27], including H3K4mel, 
H3K27ac, H3K4me3, H3K27me3, and H3K9me3. 

Enhancers were defined as H3K4mel peaks that are dis- 
tant from ± 2.5 kb around any TSS in the switchDbTss 
table downloaded from the UCSC database. To produce 
the profiles in Figure 4A, we selected enhancers within in- 
tragenic regions, which account for approximately two- 
thirds of enhancers in the adult human brain [9]. For CGI 
shores, we selected promoter CGIs. We checked a region 
± 5 kb around the center of the H3K4mel peak and 3 kb 
upstream and downstream of CGIs. For all profiling, the 
average hmC, mC, and modC levels, the hmC/mC ratio, 
and the number of the four modCG groups (hmC'^'^'^, 
mC'^'sh ^f^ii^ modC'°") relative to all CpGs were 
binned into 50-bp sliding windows in 25-bp steps. 

To profile hmC and mC in H3K27me3- and 
H3K9me3-marked regions, each region was divided into 
100 equally sized bins, and average hmC, mC, and 
mode levels and the hmC/mC ratio were calculated for 
each bin, as well as in 1-kb windows throughout 10 kb 
upstream and downstream of the regions. 

Accession numbers 

The TAB-Seq and BS-Seq data have been deposited to 
the NCBI under accession number GSE46710, and the 
TA-RRBS data have been deposited under accession 
number GSE55249. 

Additional files 



Additional file 1: Table SI. Summary of sequencing details. 

Additional file 2: Figure SI. Quantitative values of hmC relative to dC 
measured by LC-MS/MS. LC-MS/M5 was performed to genomic DNAs 
isolated from several regions of two adult brain and two fetal brain 
samples. For each sample, the average value with the standard deviation 
from technical duplicates was shown. 



Additional file 3: Figure S2. Chromatograms of LGMS/MS. The LC-MS/ 
MS chromatograms of hmC (left panels) and C (right panels) for genomic 
DNAs extracted from the adult (a) and fetal (b) frontal lobes were shown 
with the standard curve (c). The peak area counts are marked. Please go 
to the figshare website [49] for all the raw chromatograms. 

Additional file 4: Figure S3. The percentages of hmC (by TAB-Seq) or 
mode (by BS-Seq) in the fetal hippocampus in the contexts of CG, CHH, 
and CHG. 

Additional file 5: Figure S4. Features of hydroxymethylome in the 
human brain, (a, b) Distribution of all covered CpGs according to their 
hydroxymethylation and methylation frequencies in the adult (a) and the 
fetal (b) brains, (c) Distribution of the CpG categories (modC'™, mC^'^^, 
and hmC^''^^) and all captured CpGs (All CpGs) on different genomic 
elements, (d) Average absolute levels of hmC at different genomic 
elements in the fetal brain. 

Additional file 6: Figure S5. Prominent hmC changes at the exon-intron 
boundaries in the human brain. Profiles of hmC and mC for a 200-bp 
window (50 bp for exon and 150 bp for intron) around the exon-intron and 
intron-exon boundaries. IVIodification levels of hmC, mC, total DNA methylation 
(hmC + hmC), and the ratio of hmC to mC are shown for all internal exons 
(n = 1 76,455) in the sense strand. 

Additional file 7: Table S2. Information of 18,036 exons having CpG 
sites at 5' splicing sites. 

Additional file 8: Figure S6. Profiles of hmC and mC at the exon-intron 
boundary of exons which have a CpG at 5'ss position +4 or +5. Modification 
levels of hmC, mC, total DNA methylation (hmC + hmC) were shown for a 
40-bp window around the exon-intron boundaries at single-nucleotide 
resolution of two types of exons, which have a CpG at 5'ss position +4 
or +5, and are named -F4CG and -F5CG exons, respectively. Since a CpG at 
one position will lead to absence of CpG at the nearest neighboring 
position and thus no methylation value, we merged the data of the sense 
and the antisense strands for each type of exons. 

Additional file 9: Table S3. All expressed genes in the adult human 
brain revealed by RNA-Seq. 

Additional file 10: Table S4. hmC, mC, and modC levels on each 
strand of all individual genes. 

Additional file 1 1 : Figure S7. The average levels of modC on sense and 
antisense strands of genes expressed at different levels in the adult brain. 
One-tailed paired Student's ttest. nd, no statistical difference (P >0.01). 

Additional file 12: Figure S8. The mC profiles across each strand of 
the exon. The profile across exons of sense (lined) and antisense (dotted) 
strands of highly-expressed genes (red) and no-expression genes (black) 
in the adult brain showed that a transcription-correlated mC bias toward 
the antisense strand of the exon. 

Additional file 13: Figure S9. The mC profiles across each strand of 
the gene body in the mouse brain. The profile across genes of the sense 
(lined) and antisense (dotted) strands of expressed genes (n = 1 1,424, red) 
and genes with no expression (n = 5,203, black) in the adult mouse brain 
showed that a transcription-correlated mC bias toward the antisense 
strand of the gene body in the mouse brain. The TAB-Seq, BS-Seq, and 
RNA-Seq data for analysis were obtained from Lister ef al. [25]. 

Additional file 14: Figure S10. FACS scatter plot for the control 
sample for isolation of neuronal nuclei, which was processed with IgG. 
Additional file 15: Figure S11. Distribution of modC'°", mC^'^h^ 
hmC^'^^ surrounding the midpoints of active or poised enhancers. 

Additional file 16: Figure S12. hmC and mC maps in a 2-mb genomic 
region on chromosome 1 1. This example shows depletion of hmC and 
enrichment of mC on the H3K9me3- and H3K27me3-marked repressive 
regions. ChlP-Seq data for H3K4mel, H3K4me3, H3K9me3, and 

H3K27me3 were obtained from Zhu ef al. [26]. 

Additional file 17: Figure SI 3. hmC and mC maps in a 2.8-mb 
genomic region surrounding the NRXNI gene. This example shows 
enrichment of hmC and H3K4mel within the genie region and 
enrichment of mC and H3K27me3 in the neighboring intergenic regions. 
ChlP-Seq data for H3K4mel, H3K4me3, H3K9me3, and H3K27me3 were 
obtained from Zhu ef al. [26]. 
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